Automated Deep Lexical Acquisition for Robust Open Texts Processing
Authors
Abstract
In this paper, we report on methods to detect and repair lexical errors in deep grammars. Lack of coverage has long been the major problem for deep processing: the various errors in large hand-crafted grammars prevent their use in real applications, and detecting and repairing these errors manually requires significant human effort. An experiment with the British National Corpus shows that about 70% of its sentences contain words unknown to the English Resource Grammar (ERG; Copestake and Flickinger, 2000). With error mining methods, many lexical errors are discovered that cause a large proportion of the parsing failures. Moreover, a lexical type predictor based on a maximum entropy model automatically generates new lexical entries. The contributions of various features to the model are evaluated. Using disambiguated full parsing results, the precision of the predictor is improved significantly.
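As a rough illustration of the lexical acquisition step, the sketch below trains a maximum entropy classifier (multinomial logistic regression via scikit-learn) that maps an unknown word in context to a lexical type. The feature set, toy training data, and type labels are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch of a maximum entropy lexical type predictor.
# Assumptions: scikit-learn is available; the feature template, the toy
# training examples, and the ERG-style type labels below are illustrative.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression


def features(tokens, i):
    """Illustrative contextual and morphological features for tokens[i]."""
    word = tokens[i]
    return {
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        "prev_word": tokens[i - 1] if i > 0 else "<s>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "</s>",
        "has_digit": any(c.isdigit() for c in word),
        "capitalized": word[0].isupper(),
    }


# Toy training data: (sentence tokens, position, gold lexical type).
# In a real system this would come from disambiguated (treebanked) parses.
train = [
    (["the", "dog", "barked"], 1, "n_-_c_le"),
    (["the", "dog", "barked"], 2, "v_-_le"),
    (["a", "red", "ball"], 1, "aj_-_i_le"),
]

vec = DictVectorizer()
X = vec.fit_transform([features(toks, i) for toks, i, _ in train])
y = [t for _, _, t in train]

# Multinomial logistic regression is a maximum entropy classifier.
maxent = LogisticRegression(max_iter=1000)
maxent.fit(X, y)

# Predict a lexical type for an unknown word in context.
test_tokens = ["the", "wug", "barked"]
x = vec.transform([features(test_tokens, 1)])
print(maxent.predict(x))        # most probable lexical type
print(maxent.predict_proba(x))  # full distribution over candidate types
```

A real system would train on the disambiguated parsing results mentioned above and turn each confidently predicted type into a new lexical entry for the grammar; this sketch only illustrates the classification step.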
Similar resources
Robust deep linguistic processing
This dissertation deals with the robustness problem of deep linguistic processing. Hand-crafted deep linguistic grammars provide precise modeling of human languages, but are limited in their ability to handle ill-formed or extra-grammatical inputs. In this dissertation, we argue that with a series of robust processing techniques, improved coverage can be achieved without sacrificing effi...
Acquiring Lexical Knowledge from Text: A Case Study
Language acquisition addresses two important text processing issues. The immediate problem is understanding a text despite lexical gaps. The long-term issue is that the understander must incorporate new words into its lexicon for future use. This paper describes an approach to constructing new lexical entries in a gradual process by analyzing a sequence of example texts. Th...
Combining NLP and statistical techniques for lexical acquisition
The growing availability of large on-line corpora encourages the study of word behaviour directly from accessible raw texts. However, the methods by which lexical knowledge should be extracted from plain texts are still a matter of debate and experimentation. This paper presents an integrated tool for lexical acquisition from corpora, ARIOSTO, based on a hybrid methodology that combines ...
Automated Acquisition of Multiword Expressions for Robust Deep Parsing
In this presentation, I mainly deal with automated acquisition of Multiword Expressions as a means of enhancing the robustness of lexicalised grammars used in robust deep parsing for real-life applications. Specifically, I begin by taking a closer look at the linguistic properties of MWEs, in particular their lexical, syntactic, and semantic characteristics. The term Multiword Expressions h...
Developing a Semantic Similarity Judgment Test for Persian Action Verbs and Non-action Nouns in Patients With Brain Injury and Determining its Content Validity
Objective: Evidence from brain trauma suggests that the two grammatical categories of noun and verb are processed in different regions of the brain due to differences in the complexity of grammatical and semantic information processing. Studies have shown that verbs belonging to different semantic categories lead to neural activity in different areas of the brain, and action verb processing is r...
Publication date: 2006